Metagenomic surveillance for bacterial tick-borne pathogens using nanopore adaptive sampling

Technological and computational advancements in the fields of genomics and bioinformatics are providing exciting new opportunities for pathogen discovery and genomic surveillance. In particular, single-molecule nucleotide sequence data originating from Oxford Nanopore Technologies (ONT) sequencing platforms can be bioinformatically leveraged, in real-time, for enhanced biosurveillance of a vast array of zoonoses. The recently released nanopore adaptive sampling (NAS) strategy facilitates immediate mapping of individual nucleotide molecules to a given reference as each molecule is being sequenced. User-defined thresholds then allow for the retention or rejection of specific molecules, informed by the real-time reference mapping results, as they are physically passing through a given sequencing nanopore. Here, we show how NAS can be used to selectively sequence DNA of multiple bacterial tick-borne pathogens circulating in wild populations of the blacklegged tick vector, Ixodes scapularis.

www.nature.com/scientificreports/ NAS operates through an advanced bioinformatic pipeline that compares nucleotides of individual molecules (~ 200-400 base pairs) against a user-specified reference file every ~ 0.4 s, as sequencing is occurring. This dynamic method has a wide number of applications and can selectively enrich targets of interest (e.g., select genes, chromosomes, mitogenomes, RNAs, etc.) and reject unwanted molecules throughout a given sequencing run 23,25 . NAS can be leveraged for real-time pathogen surveillance and discovery using two approaches. First, depletion of host genomic DNA or RNA sequences can be accomplished in real-time by supplying a host reference genome (e.g., human, bat, rodent, primate, mosquito, tick) and rejecting host reads (Fig. 1B). The resulting sequence data is enriched for the entire metagenomic community and then mined for de novo surveillance of putative pathogens. Second, a reference file containing whole genomes or select genes (e.g., phylogenetically informative loci, antimicrobial resistance genes) of all known pathogens of interest (including both viral and bacterial targets in a single file) can be provided (Fig. 1A). If present within a sample, sequences with a minimum of ~ 70% sequence similarity to the reference will be retained or rejected as specified by the user 23 . This aspect of NAS distinguishes it from all other sequencing methods, as the process can detect extremely small proportions of nonhost DNA/RNA (e.g., low abundance viral and/or bacterial pathogens intermixed with > 99% host molecules) 24,26 .
A critical observation with respect to NAS is that nanopore-based sequencing routinely produces DNA/RNA reads thousands of bases long. Thus, the metagenomic data generated by NAS increases confidence in resulting surveillance findings and taxonomic identification as entire viral or bacterial genomes can be recovered. For these reasons, NAS-based surveillance can potentially be used for de novo identification of pathogens via unbiased metagenomic/metatranscriptomic enrichment or through the capture of novel strains based on sequence similarity shared between suspected agents and those for which genomic resources are available.
To evaluate the utility of NAS for pathogen surveillance, we attempted to detect and characterize tick-borne pathogens (TBPs) vectored by the blacklegged tick, Ixodes scapularis (Acari: Ixodidae). Globally, ixodid ticks include some of the most significant disease vectors of importance to both human and animal health, and are capable of transmitting a wide diversity of bacterial, protozoal, and viral pathogens to their vertebrate Figure 1. Comparison of nanopore adaptive sampling (NAS) and control sequencing methods. During NAS, real-time mapping of reads against a user-specified reference allows investigators to target desired genomic regions for sequencing enrichment or depletion. Here, we used select whole genome assemblies to enrich for a select assemblage of tick-borne pathogens (A), resulting in the rejection of unmapped, non-target reads from the sequencing nanopore. Alternatively, we used the genome assembly of the I. scapularis tick vector for a depletion approach in which all mapped host DNA sequences were rejected (B), leading to the retention of any unmapped, non-host reads. A control sequencing experiment was also performed without targeted enrichment or depletion and in which all molecules present in the library were sequenced to completion. Map  www.nature.com/scientificreports/ hosts. As a group, TBPs serve as favorable targets to understand how NAS can be used for sequence-based pathogen surveillance. Recent studies have highlighted the complex bacterial, viral, and eukaryotic metagenomic communities maintained by tick species and populations, which harbor an abundance of both pathogenic and non-pathogenic microorganisms 22,[27][28][29][30][31] . Frequently, individual ticks may be co-infected with multiple TBPs, the epidemiological and clinical consequences of which are still poorly understood 27,32,33 . In the United States, I. scapularis represents the vector species of greatest public health importance 34 . While I. scapularis is best known for its role in the transmission of Borrelia burgdorferi s.l.-the causative agent of Lyme disease-it is increasingly evident that the species is also responsible for transmitting numerous other TBPs. In addition to B. burgdorferi s.l., at least six other human pathogens are currently recognized to be vectored by I. scapularis, including the causative agents of Borrelia miyamotoi disease (caused by a distinct spirochete in the relapsing fever clade), anaplasmosis, ehrlichiosis, babesiosis, and Powassan virus encephalitis [34][35][36][37] . A complicating factor in the control of tick-borne diseases has been the dramatic expansion of I. scapularis populations over the past decades, which has coincided with increasing I. scapularis vector densities and rates of human-tick contact throughout many regions of North America 34, [38][39][40] . As a result, the reported incidence of tick-borne diseases in the U.S. over the past two decades has more than doubled 38 . Despite these trends, current methods for TBP surveillance are limited in their reliance on conventional PCR and ability to detect only a single or limited range of TBPs. To evaluate the potential of nanopore sequencing-in combination with the NAS strategy-for TBP surveillance, we attempted to detect and characterize TBPs from field-collected I. scapularis ticks from Minnesota and Wisconsin, USA, two states with a high burden of tick-borne disease 27,28,36,41,42 .
In this work, we leveraged NAS for the detection of four bacterial TBPs: B. burgdorferi s.s.; B. miyamotoi; Anaplasma phagocytophilum; and Ehrlichia muris eauclairensis circulating in wild-caught I. scapularis. We demonstrate that the NAS method successfully recovered TBP-derived sequences using both enrichment and host depletion strategies, resulting in the sequencing of entire bacterial pathogen genomes. Importantly, our NAS results regarding pathogen status were in full agreement with independently generated Illumina 16S microbiome data. Additionally, adaptive sampling through both enrichment and depletion approaches allowed for clear differentiation between closely related bacterial TBPs (e.g., the Lyme disease agents B. burgdorferi s.s. and B. mayonii, and the relapsing fever agent B. miyamotoi), highlighting its potential utility as a novel approach for metagenomic pathogen surveillance across a variety of arthropod vector species.

Results
Tick sampling and sequencing strategies. We isolated total genomic DNA from eight adult female I. scapularis specimens collected at different localities across Minnesota and Wisconsin, USA (Fig. 1). The V4 region of the bacterial 16S rRNA gene was amplified across all eight samples, and microbiome sequencing using the Illumina MiSeq platform revealed that all eight ticks were infected with Borrelia (Borreliella) burgdorferi s.l. Five of the eight ticks exhibited evidence of co-infection with an additional bacterial tick-borne agent. In total, our 16S Illumina sequencing suggested the likely presence of four bacterial TBPs: A. phagocytophilum, B. burgdorferi s.l., B. miyamotoi, and E. muris eauclairensis across our I. scapularis samples ( Table 1). The total number of classified 16S reads varied by sample and TBP-ranging from 1097 to 34,489-suggesting wide variability in the relative abundance of each bacterial pathogen in its respective tick host.
Next, barcoded nanopore libraries were generated using genomic DNA from the same eight I. scapularis ticks. We first set out to evaluate the performance of the NAS enrichment approach (Fig. 1A) for targeted TBP detection and prepared two identical nanopore sequencing libraries. In these libraries, four samples were barcoded (Tick 1-Tick 4), pooled, and nanopore sequencing adapters were ligated; both final sequencing libraries contained between 140 and 150 ng of DNA. The paired libraries were then sequenced on MinION R9.4 flow cells over a 24-h runtime. During the sequencing of one library, NAS was enabled through a reference file containing whole genome (and plasmid) sequences of both established and unlikely Ixodes-transmitted TBPs was provided for www.nature.com/scientificreports/ adaptive sampling enrichment (Supplementary Table S1). The second of the paired libraries was sequenced without NAS as a control. The library sequenced with adaptive sampling yielded a total of over 5.5 million reads and 1.9 Gb of sequence data. A comparable amount of total sequencing output was obtained from the control library, with 1.8 Gb of sequence data generated (Table 2). A notably shorter average read length was obtained from the NAS enrichment sequencing experiment in comparison to the control (359.0 bp and 627.3 bp, respectively). This difference in average read length was anticipated and likely due to the retention of a significant number of unmapped I. scapularis reads in which the initial 200-400 bp were sequenced prior to being rejected from the nanopore. Overall, both flow cell performance and total sequencing output were comparable between paired NAS enrichment and control sequencing experiments; however, a steeper decrease in active nanopores was noted during NAS enrichment, likely due to increased pore stress caused by the frequent rejection of unmapped DNA molecules (Supplementary Fig. S2-S5).
Evaluation of NAS with enrichment for tick-borne pathogens. Reads were basecalled in realtime using the fast basecalling model in guppy (i.e., basecalling speed is prioritized over read accuracy). Realtime basecalling is a notable requirement for NAS, as rapidly basecalled reads are needed for alignment to the reference sequences used for either enrichment or depletion. In total, our paired NAS enrichment and control libraries yielded a total of 4216 and 2631 successfully mapped reads, respectively. The real-time mapping results we obtained were in full agreement with our previous 16S Illumina MiSeq findings in terms of TBPs detected by sample. Importantly, real-time alignment also enabled us to distinguish between the closely-related B. burgdorferi s.l. Lyme Disease Group spirochetes (i.e., B. burgdorferi s.s. and B. mayonii) to confirm that all four ticks were infected with B. burgdorferi s.s., a level of taxonomic resolution not attainable from our 16S Illumina MiSeq data. Following the conclusion of each sequencing experiment, we conducted post-hoc basecalling of raw nanopore Fast5 signal data using a high accuracy basecalling model to improve the per-base accuracy of generated reads. After quality filtering and extraction of reads mapping to our target TBPs, we observed that NAS achieved roughly two-fold total read enrichment over our control sequencing library (Fig. 2). This roughly two-fold enrichment was observed to be consistent across all four tick samples and individual TBPs sequenced over the NAS enrichment experiment. Similarly, the total number of successfully mapped bases indicated a similar level of sequencing enrichment when NAS was enabled in comparison to our control ( Fig. 2). During these analyses, we also noted that numerous reads for both sequencing runs mapped to the genome of the eukaryotic pathogen, Babesia microti; however, further characterization of these reads suggested that they derived from repetitive elements within the I. scapularis genome and did not suggest active infection. For this reason, we chose to focus our remaining analyses on characterizing bacterial TBPs only. Total reads were also processed using the kraken2 package, an alternative minimizer-based classification tool independent of our previous mapping and characterization using the minimap2 package for real-time TBP genome alignment 43,44 . This approach for read classification provided an additional and independent assessment of taxonomic diversity found within each I. scapularis sample. The output from these analyses supported our previous findings in terms of which bacterial TBPs were detected and the level of sequencing enrichment achieved using NAS (Table 3). Overall, the relative numbers of nanopore reads classified using both the minimap2 www.nature.com/scientificreports/  www.nature.com/scientificreports/ approach and kraken2 were comparable and consistent between samples and sequencing experiments. The kraken2 pipeline was also used to provide additional confirmation of the presence of B. burgdorferi s.s. in our samples. Notably few reads from our samples were classified as B. mayonii, while > 100 reads were classified as B. burgdorferi s.s. After concluding that sequencing via NAS achieved a notable level of TBP sequence enrichment over our control, we assessed where reads mapped to corresponding bacterial TBP genomes. We extracted NAS mapping coordinates for all individual mapped reads and then plotted these across the complete genomes of the corresponding TBPs (Fig. 3). These findings demonstrate that sequencing via NAS resulted in a substantial amount of long-read sequence data with numerous bacterial TBP sequences in excess of 5 kb. Additionally, we demonstrate that pathogen-derived NAS sequences spanned the assembled genomes, at varying depths of coverage, of each TBP.
Performance of NAS using a host depletion approach. To compare the efficacy of NAS for both enrichment of pathogen targets and depletion of host genomic DNA, an additional library containing four additional I. scapularis ticks (Tick 5-Tick 8) was produced and sequenced. As above, sequencing was done using a new MinION R9.4 flow cell and post-hoc basecalling of raw signal data was conducted using a high accuracy basecalling model. During sequencing of this library, adaptive sampling was enabled with specifications for the depletion of the complete I. scapularis genome (Fig. 1B). In addition to the host tick genome, the whole genome of Rickettsia buchneri was also provided for depletion, as this non-pathogenic bacterial endosymbiont is frequently found at an overwhelming abundance in I. scapularis ticks 45,46 . Overall, we observed similar output measurements following this depletion experiment as was seen during NAS enrichment ( Table 2). Across all ticks sequenced through NAS depletion, a short average read length was again observed, as an abundance of www.nature.com/scientificreports/ reads which successfully mapped to the I. scapularis or R. buchneri genomes were rejected from the sequencing nanopores and in which the first few hundred base pairs of data were retained. Following a similar approach as was taken for our NAS enrichment and control experiments, reads from each tick generated during this depletion run were characterized using minimap2 and the kraken2 packages. Across these four ticks sequenced by NAS depletion, we again achieved full concordance with our Illumina 16S data in terms of which TBP taxa were detected (Table 3). In addition, we noted numerous similarities between our NAS enrichment and depletion experiments. Although the total number of reads that were successfully classified to a particular TBP showed greater variability in our NAS depletion experiment, both resulted in similar numbers of successfully mapped and classified reads and showed similar success at differentiating between related TBPs (i.e., classification of reads clearly demonstrated that ticks were infected with B. burgdorferi s.s. and not B. mayonii or B. miyamotoi) (Fig. 3). In comparison to the control, data generated over both NAS enrichment and depletion demonstrated substantially fewer long-read data mapping against the bacterial endosymbiont, R. buchneri, suggesting that either NAS approaches performed comparably at depleting sequencing output for this non-target organism (Fig. 4).

Discussion
Surveillance of vector arthropods for pathogens is an important part of the public health repertoire for combating vector-borne diseases. The U.S. Centers for Disease Control and Prevention have stated that "new tools for preventing tick-borne diseases are urgently needed…" 47 . Nanopore sequencing allows for rapid detection of pathogens with minimal sample preparation and reagent requirements, and can be performed across a variety of field conditions. This facet of the technology brings the possibility of arthropod-borne pathogen surveillance sequencing to groups and regions without ready access to laboratory or internet resources. In places with high rates of endemic vector-borne disease, local public health organizations can use nanopore sequencing to rapidly determine the prevalence of pathogens in vector species without needing to rely on the testing facilities of state or national laboratories. Moreover, unlike traditional PCR which allows for pathogen detection by amplifying short regions of template DNA, nanopore sequencing produces long reads that provide substantially more genomic information. As such, nanopore sequencing will see utility in the differentiation of pathogenic and non-pathogenic strains within the same species. For example, human infectious A. phagocytophilum strains can be rapidly differentiated from non-human infectious ungulate strains without requiring nested PCR 48 . Genetically similar species, such as spotted fever group rickettsiae (some of which are not human infectious), will be rapidly buchneri. Total R. buchneri-mapped reads generated over NAS depletion, NAS enrichment, and control sequencing experiments are presented as points by read length over a logarithmic scale. Over both NASdepletion (green) and NAS-enrichment (orange) sequencing strategies, the read length distribution shows a greater proportion of reads < 1000 bp, indicative of real-time rejection of non-target R. buchneri reads during each sequencing experiment. Conversely, during the control experiment (blue), an increased proportion of reads between 1000-10,000 bp is observed, as all non-target R. buchneri reads present in the library were sequenced fully. www.nature.com/scientificreports/ differentiable in the field rather than relying on PCR amplification followed by sequencing or restriction fragment length polymorphism techniques, as is currently done 49,50 . The recent development of adaptive sampling on ONT sequencing platforms offers even greater potential for sequence-based applications in arthropod vector surveillance. NAS can be used to selectively sequence DNA or RNA molecules associated with entire pathogen genomes via NAS enrichment. Using NAS for such a targeted enrichment approach, investigators can target a wide range of pathogen-derived nucleic acids which may be substantially less abundant in their library in comparison to those of either the arthropod host or other microbes. Using NAS, investigators can also choose to rapidly deplete arthropod host-associated sequences to generate sequence data for entire metagenomic communities and any vector-borne pathogens contained therein. As we demonstrate here, NAS can also be leveraged for the simultaneous depletion of both host reads and those of abundant non-target microbes. Many tick species contain large quantities of symbiotic bacteria belonging to the same genera as TBPs such as Rickettsia amblyommatis in Amblyomma spp. or Francisella spp. in Dermacentor spp. In I. scapularis, 10 6 -10 7 Rickettsia buchneri may be present in a single female 51 and this symbiont commonly represents > 80% of 16S sequences in microbiome studies of female ticks 45,46,52 . The sheer quantity of genetic material from these bacteria can overwhelm less-abundant or rare sequences in methods used to evaluate the presence of multiple bacterial species within field sampled ticks. Thus, NAS via either genomic enrichment of pathogen targets or depletion of non-target organisms provide a valuable means to prevent output of unwanted sequence data.
We utilized nanopore MinION sequencing in combination with NAS to detect and molecularly characterize four bacterial TBPs (B. burgdorferi s.s.; B. miyamotoi; A. phagocytophilum; and E. muris eauclairensis) associated with eight female I. scapularis ticks. By enriching for whole genome assemblies of these pathogens, we confirmed that NAS was successful at targeting each detected TBP, resulting in an approximately two-fold enrichment of recovered single-molecule sequences over our control sequencing experiment ( Fig. 2; Supplementary Fig. S1). Using an alternate approach in which we used NAS to deplete our library for reads derived from both I. scapularis and R. buchneri, we were able to similarly detect TBP infections in individual tick samples. Importantly, this depletion approach has the potential advantage of being able to gather an "unbiased" view of metagenomic diversity across a variety of arthropod taxa for use in vector-borne pathogen discovery applications in which the species/strain of pathogen targeted may be unknown or poorly understood.
Our findings from both NAS enrichment and depletion experiments were supported by 16S Illumina microbiome sequencing data and an independent bioinformatic pipeline for classifying our nanopore reads. Importantly, nanopore sequencing enabled us to achieve a high level of taxonomic resolution in these samples to conclude that all eight ticks we sequenced were infected with B. burgdorferi s.s. and not the closely related species Lyme disease group spirochete B. mayonii. As two closely-related I. scapularis-transmitted pathogens, the Lyme disease agents B. burgdorferi s.s. and B. mayonii are separated by only ~ 5% genetic distance 53 . Despite their relatedness, our analyses and taxonomic classification of reads obtained by NAS were able to reliably differentiate between these bacterial species. Mapping of our long-read data against both bacterial pathogen genomes and classifying NAS reads individually using kraken2 both clearly suggested that B. burgdorferi s.s. was the sole Borrelia species present in our samples. Although PCR-based TBP surveillance approaches can easily provide similar species-level resolution, the NAS approach enabled us to generate large amounts of sequence data spanning entire bacterial TBP genomes. Using NAS, we noted the recovery of extremely long, intact DNA sequences associated with each bacterial TBP. For example, we received a single A. phagocytophilum read in excess of 26 kb, in addition to numerous > 5 kb reads across all detected TBPs (Fig. 3). Such long sequence lengths of individual DNA molecules secured using the NAS bioinformatic pipeline opens the door to genome-scale pathogen surveillance, an opportunity that is not easily explored using traditional methods.
Despite documenting the functionality of NAS for TBP research, we noted that numerous nanopore reads incorrectly mapped against the Ba. microti genome, a protozoal eukaryotic pathogen. Additional analysis found that these reads contained repetitive DNA sequences derived from the host Ixodes tick. Thus, we recommend caution when interpreting NAS sequence data wherein highly conserved or repetitive sequences have the potential to provide incorrect taxonomic assignment or spurious results. In addition, although our two-fold sequence enrichment provided considerable genomic information for targeted low-abundance TBPs, we also note that increased levels of enrichment are likely possible through a combination of extracting higher molecular weight genomic DNA and the loading of a larger sequencing library onto the MinION flow cell. Conversely, while sequencing at the level of the individual tick can provide detailed information about pathogen presence, microbial diversity and co-occurrence relationships 27,29 , it is likely not feasible nor cost effective for robust surveillance in many settings (e.g., an environment where pathogens are at very low prevalence). Fortunately, NAS-based surveillance approaches are widely scalable and can be adapted for use with pooled samples. These caveats notwithstanding, nanopore sequencing in combination with adaptive sampling showed considerable promise at detecting bacterial TBPs using unprocessed genomic DNA and limited sample processing steps. As "proofof-concept", our study assessed eight field-collected I. scapularis ticks using the MinION sequencing platform; however, we anticipate that with optimization of library preparation methods and performance improvements to the NAS pipeline, a substantially greater number of samples may be multiplexed on a single flow cell. We predict that, as technologies and computational analyses associated with nanopore sequencing and adaptive sampling improve, a multitude of exciting applications for real-time metagenomic pathogen biosurveillance will emerge, thus facilitating unprecedented insights into host-pathogen systems. www.nature.com/scientificreports/

Methods
Tick sample collection and nucleic acid extraction.. As part of larger sampling efforts, ticks were collected using a standard cloth dragging technique from five localities in Minnesota and Wisconsin between April 2017 and June 2019. Sampling consisted of dragging a light-colored 1 m 2 cloth over the ground and lowlaying vegetation to attract host-seeking ticks. Ticks attached to the drag cloth were removed using forceps and visually identified to species, developmental stage, and sex. Adult female I. scapularis specimens were selected from each locality sampled to be included in this study. Surface contaminants were removed by sequentially washing ticks in Tween 20 and 0.5% benzalkonium chloride solutions; ticks were next rinsed with distilled water and 70% ethanol, and stored in fresh 70% ethanol prior to nucleic acid extraction. To promote effective tissue lysis, whole ticks were bisected using a sterile 16-gauge needle and incubated in a lysis buffer solution (i.e., 180 μl Qiagen Buffer ATL and 20 μl proteinase K) at 56 °C overnight. The resulting digestion mixture was processed using a DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following manufacturer instructions. DNA extracts were quantified using a Qubit 4 fluorometer (Invitrogen, Carlsbad, United States) and visualized by agarose gel electrophoresis. primers. An initial qPCR step was performed using the Meta_V4_515F and Meta_V4_806R primer pair, and consisted of an initial 95° C denaturation for five minutes, 35 cycles of 98° C denaturation for 20 s, 55° C annealing for 15 s, and 72° C extension for 60 s, and concluded with a final extension for five minutes at 72° C. These data were used to normalize each tick template sample to approximately 167,000 molecules/μl. Next, a conventional PCR using the same primer pair and normalized template was conducted using the following parameters: initial denaturation at 95° C for 5 min; 25 cycles of 98° C denaturation for 20 s, 55° C annealing for 15 s; and 72° C extension for 60 s; followed by a 72° C final extension for 5 min.  Table S1). As a control, the paired library was sequenced without adaptive sampling enrichment. The third sequencing experiment was performed using the NAS depletion approach with Tick 5-Tick 8 (Fig. 1B). For this experiment, the I. scapularis genome (NCBI Assembly: GCA_016920785.2) was used as a reference for real-time host depletion with the NAS method. In addition, the complete genome assembly for R. buchneri (GCF_000160735.1) was concatenated to the reference to provide supplemental depletion for this highly abundant, non-pathogenic bacterial tick symbiont. For all libraries, real-time alignment using the TBP enrichment reference file was enabled in MinKNOW to allow mapping of reads against our target TBP genomes through minimap2 44 . Reads initially mapped during sequencing in real-time were basecalled using the fast basecalling model in Guppy. Each sequencing experiments was allowed to proceed for roughly 24 h to allow each library to be sequenced to near completion. For all downstream analyses, raw reads in Fast5 format were re-basecalled and demultiplexed using Guppy, v4.2.2, in GPU mode using the high accuracy (HAC) basecalling model. All raw nanopore sequence data generated herein are available via the NCBI SRA under BioProject ID PRJNA790938; BioSamples SAMN24251598-SAMN24251609.
Bioinformatic processing. Paired-end raw 16S reads generated on the Illumina MiSeq platform were demultiplexed and Nextera adapters were trimmed. The resulting reads were processed using the DADA2 pipeline (https:// github. com/ benjj neb/ dada2) to taxonomically assign 16S amplicons to the level of the bacterial genus. The resulting amplicon sequence variant file was queried for pathogenic agents of interest (e.g., Anaplasma, Borrelia (relapsing fever-associated species), Borreliella (Lyme disease-associated Borrelia species), Ehrlichia) to determine pathogen presence and the relative abundance of raw V4 16S reads for each sample.

Data availability
All raw data from ONT MinION and Illumina MiSeq sequencing experiments described here are deposited in FASTQ format within the NCBI Sequence Read Archive (SRA) repository under BioProject ID PRJNA790938. Data generated through NAS enrichment, NAS depletion, and control sequencing experiments are accessioned under the following BioSamples: SAMN24251598 -SAMN24251609. BioSample numbers for the corresponding paired-end Illumina MiSeq 16S data are accessioned as follows: SAMN24251610 -SAMN24251617.